Implement OpenTelemetry integration - client/server and explicit token latency #296

Draft: ppalucki wants to merge 1 commit into main from ppalucki-telemetry-poc

Conversation


@ppalucki ppalucki commented Aug 12, 2024

Description

Integrate OpenTelemetry (metrics only) and expose the metrics using the Prometheus exporter for:

  1. Server (incoming connections): request/response size and duration, with an http_route label for each handler:
    • http_server_duration_milliseconds (histogram: bucket/total/sum)
    • http_server_request_size_bytes_total (counter)
    • http_server_response_size_bytes_total (counter)
  2. Client (all outgoing connections to services), with a net_peer_name label to distinguish the target:
    • http_client_duration_milliseconds (histogram: bucket/total/sum)
    • http_client_request_size_bytes_total (counter)
    • http_client_response_size_bytes_total (counter)
  3. Explicit Float64 histograms for first token, next token, all token, and pipeline latencies:
    • llm_first_token_latency_milliseconds (histogram: bucket/total/sum)
    • llm_next_token_latency_milliseconds (histogram: bucket/total/sum)
    • llm_all_token_latency_milliseconds (histogram: bucket/total/sum)
    • llm_pipeline_latency_milliseconds (histogram: bucket/total/sum)
  4. Other default Go runtime metrics, e.g. go_gc_duration_seconds, go_goroutines and others.

As an additional feature, OpenTelemetry trace context propagation is integrated (not tested).
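
For orientation, a minimal sketch of how this wiring typically looks with the OTel Go SDK, Prometheus exporter and otelhttp (the handler names are taken from this PR, but the code here is illustrative, not copied from the change):

// Create a Prometheus exporter and register it as the global meter provider.
exporter, err := prometheus.New()
if err != nil {
	log.Error(err, "telemetry: cannot create Prometheus exporter")
}
provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
otel.SetMeterProvider(provider)

// Wrap the inbound handler with otelhttp so http_server_* metrics get recorded,
// and expose everything on /metrics via the Prometheus client handler.
mux := http.NewServeMux()
mux.Handle("/", otelhttp.NewHandler(http.HandlerFunc(mcGraphHandler), "mcGraphHandler"))
mux.Handle("/metrics", promhttp.Handler())

// Outbound requests use an otelhttp transport, which records the http_client_*
// metrics with a net_peer_name label per target service.
client := &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
_ = client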

Metrics are exposed on /metrics with the Prometheus handler. Example output:

1) server metrics example

# HELP http_server_duration_milliseconds Measures the duration of inbound HTTP requests.
# TYPE http_server_duration_milliseconds histogram
http_server_duration_milliseconds_bucket{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0",le="0"} 0
http_server_duration_milliseconds_bucket{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0",le="5"} 0
http_server_duration_milliseconds_bucket{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0",le="+Inf"} 2
http_server_duration_milliseconds_sum{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 25597.877786
http_server_duration_milliseconds_count{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 2
# HELP http_server_request_size_bytes_total Measures the size of HTTP request messages.
# TYPE http_server_request_size_bytes_total counter
http_server_request_size_bytes_total{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 202
# HELP http_server_response_size_bytes_total Measures the size of HTTP response messages.
# TYPE http_server_response_size_bytes_total counter
http_server_response_size_bytes_total{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 134

2) client metrics example

# HELP http_client_duration_milliseconds Measures the duration of outbound HTTP requests.
# TYPE http_client_duration_milliseconds histogram
http_client_duration_milliseconds_bucket{http_method="POST",http_status_code="200",net_peer_name="embedding-svc.chatqa.svc.cluster.local",net_peer_port="6000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0",le="5000"} 2
...
http_client_duration_milliseconds_bucket{http_method="POST",http_status_code="200",net_peer_name="llm-svc.chatqa.svc.cluster.local",net_peer_port="9000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0",le="0"} 0
...
# HELP http_client_request_size_bytes_total Measures the size of HTTP request messages.
# TYPE http_client_request_size_bytes_total counter
http_client_request_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="embedding-svc.chatqa.svc.cluster.local",net_peer_port="6000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 202
http_client_request_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="llm-svc.chatqa.svc.cluster.local",net_peer_port="9000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 3670
http_client_request_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="reranking-svc.chatqa.svc.cluster.local",net_peer_port="8000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 14076
http_client_request_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="retriever-svc.chatqa.svc.cluster.local",net_peer_port="7000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 19498
# HELP http_client_response_size_bytes_total Measures the size of HTTP response messages.
# TYPE http_client_response_size_bytes_total counter
http_client_response_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="embedding-svc.chatqa.svc.cluster.local",net_peer_port="6000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 19422
http_client_response_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="llm-svc.chatqa.svc.cluster.local",net_peer_port="9000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 480
http_client_response_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="reranking-svc.chatqa.svc.cluster.local",net_peer_port="8000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 3640
http_client_response_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="retriever-svc.chatqa.svc.cluster.local",net_peer_port="7000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 13972

3) custom metrics example

# HELP llm_first_token_latency_milliseconds Measures the first token generate duration of.
# TYPE llm_first_token_latency_milliseconds histogram
llm_first_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="1"} 0
llm_first_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="64"} 0
...
llm_first_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="+Inf"} 2
llm_first_token_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 9550.203300000001
llm_first_token_latency_milliseconds_count{otel_scope_name="entrag-telemetry",otel_scope_version=""} 2
# HELP llm_next_token_latency_milliseconds Measures the next token generate duration of.
# TYPE llm_next_token_latency_milliseconds histogram
llm_next_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="1"} 4
llm_next_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="64"} 4
...
llm_next_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="+Inf"} 36
llm_next_token_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 14365.578998999998
llm_next_token_latency_milliseconds_count{otel_scope_name="entrag-telemetry",otel_scope_version=""} 36

4) Prometheus client default Go language metrics:

skl212 ➜  ~ curl -sL -N http://127.0.0.1:8080/metrics | grep go_ | grep -v HELP | grep -v TYPE
go_gc_duration_seconds{quantile="0"} 9.7515e-05
go_gc_duration_seconds{quantile="0.25"} 0.000102628
go_gc_duration_seconds{quantile="0.5"} 0.000134259
go_gc_duration_seconds{quantile="0.75"} 0.000141445
go_gc_duration_seconds{quantile="1"} 0.000191476
go_gc_duration_seconds_sum 0.00130944
go_gc_duration_seconds_count 10
go_goroutines 8
go_info{version="go1.22.4"} 1
go_memstats_alloc_bytes 2.795128e+06
....
go_memstats_sys_bytes 2.276456e+07
go_threads 23

Issues

n/a

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

  • OpenTelemetry Go SDK (+ Prometheus exporter)
    • go.opentelemetry.io/otel
    • go.opentelemetry.io/otel/exporters/prometheus
    • go.opentelemetry.io/otel/metric
    • go.opentelemetry.io/otel/sdk/metric
  • Prometheus client:
    • github.com/prometheus/client_golang/prometheus/promhttp
  • OpenTelemetry Go contrib HTTP client/server instrumentation:
    • go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
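
For reference, the Go import block corresponding to the dependency list above might look roughly like this (grouping and aliases are illustrative):

import (
	"github.com/prometheus/client_golang/prometheus/promhttp"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)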

Tests

Manually, using curl (examples shown above), after running only with the ChatQnA sample graph.

TODO:

  • better or new labels and attributes (e.g. otel_scope_name...)
  • better bucket boundaries for token latencies
  • reconfigure default bucket boundaries for server/client latency metrics
  • test OTEL trace propagation
  • write unit tests
  • document
  • remove some comments with examples

@ppalucki ppalucki force-pushed the ppalucki-telemetry-poc branch from 9bf7090 to d15ac8c on August 12, 2024 23:42
// bag := baggage.FromContext(ctx)
// span.AddEvent("handling this...", trace.WithAttributes(uk.String(bag.Member("username").Value())))

_, _ = io.WriteString(w, "Hello, world!\n")
Collaborator

Hi @ppalucki, what's the purpose of this response?

Author

@zhlsunshine I accidentally copied it from a working branch for streaming; after a later rebase it is now removed.

// span.AddEvent("handling this...", trace.WithAttributes(uk.String(bag.Member("username").Value())))

_, _ = io.WriteString(w, "Hello, world!\n")
ctx, cancel := context.WithTimeout(req.Context(), 3600*time.Minute)
Collaborator

Hi @ppalucki, might a 1-hour timeout be too long?

Author

@zhlsunshine fixed (the long timeout was only used for my debugging sessions)

@zhlsunshine
Collaborator

Hi @ppalucki, thanks for your effort on this PR!
In mcGraphHandler:

responseBody, statusCode, err := routeStep(defaultNodeName, *mcGraph, inputBytes, inputBytes, req.Header)

I think there should be a measurement after routeStep returns the responseBody, because it covers the duration of going through all the microservices in the pipeline. That information can be used to calculate and evaluate the handling time of the GMC router itself, which would be really helpful.

I think you could also add these measurement metrics in mcDataHandler, which focuses on data embedding and storage into the VectorDB.
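
For illustration, a rough sketch of where such a measurement could go in mcGraphHandler (routeStep and pipelineLatencyMeasure are names from this PR; the exact placement here is an assumption, not the code from this change):

pipelineStart := time.Now()
responseBody, statusCode, err := routeStep(defaultNodeName, *mcGraph, inputBytes, inputBytes, req.Header)
// Record how long routing through all pipeline microservices took, in milliseconds.
pipelineLatencyMeasure.Record(req.Context(),
	float64(time.Since(pipelineStart))/float64(time.Millisecond))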

//mux.Handle("/dataprep", otelhttp.NewHandler(http.HandlerFunc(mcDataHandler), "mcDataHandler"))

// Export metric using Prometheus client built-in handler.
mux.Handle("/metrics", promhttp.Handler())
Collaborator

Hi @ppalucki, is it possible to register these router metrics via the handleFunc function as well?

Author

@ppalucki ppalucki Aug 14, 2024

@zhlsunshine fixed (added).
Yes, it is possible; initially I didn't want to include /metrics in the stats.

But you're right - it should be covered and then excluded later in visualization/post-processing.
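
For reference, a small sketch of how the /metrics route itself can be covered with the same otelhttp wrapper used for the other routes (illustrative):

// Wrap the Prometheus handler as well, so scrapes of /metrics show up in the
// http_server_* metrics alongside the other routes (filter them out later in dashboards).
mux.Handle("/metrics", otelhttp.NewHandler(promhttp.Handler(), "metrics"))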

@ppalucki
Author

ppalucki commented Aug 14, 2024

Hi @ppalucki, thanks for your effort on this PR! In mcGraphHandler:

responseBody, statusCode, err := routeStep(defaultNodeName, *mcGraph, inputBytes, inputBytes, req.Header)

I think there should be a measurement after routeStep returns the responseBody, because it covers the duration of going through all the microservices in the pipeline. That information can be used to calculate and evaluate the handling time of the GMC router itself, which would be really helpful.

I think you could also add these measurement metrics in mcDataHandler, which focuses on data embedding and storage into the VectorDB.

I quickly added:

  • all_token_latency - essentially the sum of the first and all next token latencies,
  • pipeline_latency - what you asked for, but I'm still not sure it covers what you wanted, e.g. the "user facing" latency of all pipeline steps plus network.

I need to test it more carefully.

here is example output:

llm_all_token_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 103565.76762
llm_first_token_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 38418.945988
llm_next_token_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 59319.08985300002
llm_pipeline_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 6454.966733
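
A rough sketch of where these measurements could sit in the streaming path (the scanner loop and variable names are illustrative, not the exact code from this PR):

tokenStart := time.Now()
allStart := tokenStart
firstTokenSeen := false

for scanner.Scan() { // one streamed chunk/token per iteration (illustrative)
	now := time.Now()
	elapsedMs := float64(now.Sub(tokenStart)) / float64(time.Millisecond)
	if !firstTokenSeen {
		firstTokenLatencyMeasure.Record(ctx, elapsedMs)
		firstTokenSeen = true
	} else {
		nextTokenLatencyMeasure.Record(ctx, elapsedMs)
	}
	tokenStart = now
}

// All-token latency: total time from starting to read until the stream ends,
// i.e. roughly first token latency plus the sum of next token latencies.
allTokenLatencyMeasure.Record(ctx, float64(time.Since(allStart))/float64(time.Millisecond))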

@ppalucki ppalucki force-pushed the ppalucki-telemetry-poc branch 3 times, most recently from 735d3be to d1f3b04 on September 3, 2024 12:31
Small issues and also collect stats from /metrics router
remove examples + add allTokenLatency metric
Fix description of histograms
pipeline latency histogram before token measurments
@ppalucki ppalucki force-pushed the ppalucki-telemetry-poc branch from d1f3b04 to 5548dfd on September 4, 2024 06:38
@zhlsunshine
Collaborator

Hi @ppalucki, I think you may need to fix the DCO error by adding a Signed-off-by line with your GitHub ID and email.

provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
otel.SetMeterProvider(provider)

// ppalucki: Own metrics defintion bellow
Contributor

Typos:

Suggested change
// ppalucki: Own metrics defintion bellow
// ppalucki: Own metrics definition below

Collaborator

nice catch!

Comment on lines +95 to +100
var (
	firstTokenLatencyMeasure metric.Float64Histogram
	nextTokenLatencyMeasure  metric.Float64Histogram
	allTokenLatencyMeasure   metric.Float64Histogram
	pipelineLatencyMeasure   metric.Float64Histogram
)
Contributor

@eero-t eero-t Sep 30, 2024

IMHO this would look nicer as:

Suggested change
var (
	firstTokenLatencyMeasure metric.Float64Histogram
	nextTokenLatencyMeasure  metric.Float64Histogram
	allTokenLatencyMeasure   metric.Float64Histogram
	pipelineLatencyMeasure   metric.Float64Histogram
)
type latencyMeasureT struct {
	firstToken, nextToken, allTokens, pipeline metric.Float64Histogram
}

var latencyMeasure latencyMeasureT

Comment on lines +118 to +155
firstTokenLatencyMeasure, err = meter.Float64Histogram(
	"llm.first.token.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration of first token generation."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register first token histogram measure")
}

nextTokenLatencyMeasure, err = meter.Float64Histogram(
	"llm.next.token.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration of generating all but first tokens."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register next token histogram measure")
}

allTokenLatencyMeasure, err = meter.Float64Histogram(
	"llm.all.token.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration to generate response with all tokens."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register all token histogram measure")
}

pipelineLatencyMeasure, err = meter.Float64Histogram(
	"llm.pipeline.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration to going through pipeline steps until first token is being generated (including read data time from client)."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register pipeline histogram measure")
}
Contributor

There's a bit of duplication. Maybe this could iterate over a slice, like this (untested code):

metrics := []struct {
	metric *metric.Float64Histogram
	name   string
	desc   string
}{
	{&latencyMeasure.firstToken, "llm.first.token.latency", "..."},
	...
}

for _, item := range metrics {
	*item.metric, err = meter.Float64Histogram(
		item.name,
		metric.WithUnit("ms"),
		metric.WithDescription(item.desc),
		api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
	)
	if err != nil {
		log.Error(err, "metrics: cannot register histogram measure", "name", item.name)
	}
}

Collaborator

nice catch!

@eero-t
Contributor

eero-t commented Nov 6, 2024

See also: opea-project/GenAIComps#845
